On Data Quality Assurance and Conflation Entanglement in Crowdsourcing for Environmental Studies

Authors

  • Didier G. Leibovici
  • Julian F. Rosser
  • Crona Hodges
  • Barry Evans
  • Mike Jackson
  • Christopher I. Higgins
Abstract

Volunteered geographical information (VGI), either in the context of citizen science or the mining of social media, has proven to be useful in various domains including natural hazards, health status, disease epidemics, and biological monitoring. Nonetheless, the variable or unknown data quality due to crowdsourcing settings is still an obstacle to fully integrating these data sources in environmental studies and potentially in policy making. The data curation process, in which quality assurance (QA) is needed, is often driven by the direct usability of the data collected within a data conflation or data fusion (DCDF) process, which combines the crowdsourced data into one view, potentially using other data sources as well. Looking at current practices in VGI data quality and using two examples, namely land cover validation and inundation extent estimation, this paper discusses the close links between QA and DCDF. It aims to help decide whether a disentanglement is possible, and whether it is beneficial or not, for understanding the data curation process with respect to its methodology for future usage of crowdsourced data. Analysing situations throughout the data curation process where and when entanglement between QA and DCDF occurs, the paper explores the various facets of VGI data capture, as well as data quality assessment and purposes. Far from rejecting the usability ISO quality criterion, the paper advocates for a decoupling of the QA process and the DCDF step as much as possible while still integrating them within an approach analogous to a Bayesian paradigm.

Similar resources

Mentor: A Visualization and Quality Assurance Framework for Crowd-Sourced Data Generation

Crowdsourcing is a feasible method for collecting labeled datasets for training and evaluating machine learning models. Compared to the expensive process of generating labeled datasets using dedicated trained judges, the low cost of data generation in crowdsourcing environments enables researchers and practitioners to collect significantly larger amounts of data for the same cost. However, crow...

Perform Three Data Mining Tasks with Crowdsourcing Process

For data mining studies, because feature selection is too complex to do by hand, some of the labeling has to be sent to workers through crowdsourcing activities. The process of outsourcing data mining tasks to users is often handled by software systems without enough knowledge of the age or geography of the users' residence. Uncertainty about the performance of virtual user...

Behavior-Based Quality Assurance in Crowdsourcing Markets

Quality assurance in crowdsourcing markets has appeared to be an acute problem over the last years. We propose a quality control method inspired by Statistical Process Control (SPC), commonly used to control output quality in production processes and characterized by relying on time-series data. Behavioral traces of users may play a key role in evaluating the performance of work done on crowdso...

Programmatic Gold: Targeted and Scalable Quality Assurance in Crowdsourcing

Crowdsourcing is an effective tool for scalable data annotation in both research and enterprise contexts. Due to crowdsourcing’s open participation model, quality assurance is critical to the success of any project. Present methods rely on EM-style post-processing or manual annotation of large gold standard sets. In this paper we present an automated quality assurance process that is inexpensiv...

Worker Perception of Quality Assurance Mechanisms in Crowdsourcing and Human Computation Markets

Many human computation systems utilize crowdsourcing marketplaces to recruit workers. Because of the open nature of these marketplaces, requesters need to use appropriate quality assurance mechanisms to guarantee high quality results. Previous research has mostly focused on the statistical aspects of quality assurance. Instead, we analyze the worker perception of five quality assurance mechanis...

Journal:
  • ISPRS Int. J. Geo-Information

Volume 6

Publication date: 2017